iPAS Exam Preparation Notes - AI Application Planner

TLDR

AI Fundamentals: AI, ML, and DL share a nested relationship; current mainstream commercial AI belongs to "Narrow AI."
Data Engineering: Data Lakehouse combines the flexibility of a data lake with the governance capabilities of a data warehouse; the Medallion Architecture (Bronze/Silver/Gold) is the standard model for tiered data management.
Data Processing: ELT is gradually replacing ETL to preserve raw data details for AI training.
Data Governance: Data Mesh addresses the scaling bottlenecks of centralized platforms through domain-oriented ownership.
Feature Engineering: The choice of categorical encoding (One-Hot, Target, WoE, etc.) depends on cardinality and model type; numerical features usually need scaling (Z-score, Robust Scaling), especially for distance-based and gradient-based models.
Model Evaluation: For class imbalance issues, prioritize metric selection (F1, AUC, MCC) and decision threshold adjustment rather than relying solely on Accuracy.
Deep Learning: The Transformer architecture is the cornerstone of modern NLP; CNNs excel at spatial image features; Diffusion Models are the mainstream approach to image generation.
AI Governance: The EU AI Act adopts risk-based management; AI systems must ensure fairness, explainability, and security, utilizing Model Cards and Datasheets for transparency.

AI Fundamental Concepts

AI Capability Levels and Classification

Artificial Intelligence refers to technologies that enable machines to simulate human intelligent behavior. Current commercial AI (e.g., ChatGPT, AlphaGo) belongs to "Narrow AI," characterized by:

No Autonomous Goal Setting: Can only respond to prompts or external tasks.
No Persistent Memory: Does not autonomously accumulate experience after a conversation ends.
Limited Cross-Domain Transfer: Performance relies on massive training data and post-training processes.

AI functions can be categorized into: Analytical, Predictive, Generative, and Prescriptive (recommending the best course of action).

AI, Machine Learning, and Deep Learning

The three share a nested relationship:

AI: Any technology that allows machines to exhibit intelligent behavior.
ML: Learning patterns automatically through data without explicit rule programming.
DL: Using multi-layered neural networks to automatically extract features.

Data Engineering

Data Storage Architecture

Data Warehouse: Structured data, Schema-on-Write, suitable for reporting.
Data Lake: Raw data, Schema-on-Read, suitable for exploration.
Data Lakehouse: Combines both, supports ACID transactions and version tracking, suitable for reporting, ML, and RAG.
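
The difference between Schema-on-Write and Schema-on-Read can be sketched in a few lines. This is a minimal illustration rather than how any particular warehouse or lake engine works; the `SCHEMA` dictionary, the function names, and the sample records are all hypothetical.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}

def write_to_warehouse(table, record):
    """Schema-on-Write: validate before storing and reject bad records."""
    for field, expected in SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"{field!r} must be {expected.__name__}")
    table.append(record)

def write_to_lake(files, raw_line):
    """Schema-on-Read: store the raw text untouched."""
    files.append(raw_line)

def read_from_lake(files):
    """Apply the schema only at read time, skipping rows that do not fit."""
    rows = []
    for line in files:
        try:
            data = json.loads(line)
            rows.append({f: t(data[f]) for f, t in SCHEMA.items()})
        except (ValueError, KeyError, TypeError):
            continue  # malformed rows stay in the lake but are not returned
    return rows

warehouse, lake = [], []
write_to_warehouse(warehouse, {"order_id": 1, "amount": 9.5})
write_to_lake(lake, '{"order_id": "2", "amount": "12.0"}')  # strings are still accepted
write_to_lake(lake, "not json at all")                      # so is unparseable text
```

The warehouse path fails fast on bad input, while the lake path accepts everything and defers interpretation to `read_from_lake`, which is the trade-off the two entries above describe.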

Medallion Architecture

Bronze: Raw data, maintained in its original form.
Silver: Cleaned and standardized, common across business units.
Gold: Business consumption layer, pre-calculated datasets.
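
The three layers can be pictured as successive transformations of the same records. The sketch below uses in-memory lists in place of real tables; the sample rows, the field names, and the `to_silver` / `to_gold` helpers are assumptions made for illustration.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as received (hypothetical sample data).
bronze = [
    {"store": " Taipei ", "amount": "100"},
    {"store": "taipei", "amount": "50"},
    {"store": "Tainan", "amount": "n/a"},  # invalid amount
    {"store": "Tainan", "amount": "70"},
]

def to_silver(rows):
    """Clean and standardize: normalize names, cast amounts, drop bad rows."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # the bad row is excluded here but still exists in Bronze
        silver.append({"store": row["store"].strip().title(), "amount": amount})
    return silver

def to_gold(rows):
    """Pre-calculate a business-facing aggregate: total sales per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
```

Because Bronze is never modified, the Silver and Gold layers can be rebuilt at any time if the cleaning rules change.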

Data Governance

Data Mesh: Decentralizes data ownership to business domains, managed through self-service infrastructure and federated governance.
Data Catalog/Metadata/Lineage: Solves the problems of "findability," "understandability," and "traceability" of data, respectively.

Feature Engineering

Categorical Feature Encoding Selection

One-Hot: Suitable for nominal features with few categories and no inherent order; a safe default for linear models, and workable for tree models when cardinality is low.
Ordinal: Suitable for features with a clear order (e.g., education level).
Target Encoding: Suitable for high-cardinality features, but requires precautions against Data Leakage.
WoE (Weight of Evidence): Standard practice for binary classification in the financial sector, especially credit scoring.
Feature Hashing: Suitable for streaming data or memory-constrained scenarios.
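
As an illustration of the leakage precaution for Target Encoding, the sketch below learns the encoding from training rows only and shrinks each category mean toward the global mean. The `smoothing` value, the function names, and the toy data are assumptions; production pipelines often go further and use out-of-fold encoding.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=2.0):
    """Return (mapping, global_mean) learned from training rows only."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Shrink each category mean toward the global mean; rare categories shrink most.
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, global_mean

def apply_target_encoding(categories, mapping, global_mean):
    """Encode new rows; unseen categories fall back to the global mean."""
    return [mapping.get(cat, global_mean) for cat in categories]

# Hypothetical training data: a city feature and a binary target.
train_city = ["A", "A", "A", "B"]
train_y = [1, 1, 0, 1]
mapping, prior = fit_target_encoding(train_city, train_y)

# Validation or test rows are encoded with the training mapping, never their own labels.
test_encoded = apply_target_encoding(["A", "B", "C"], mapping, prior)
```

Category "B" appears only once in training, so its encoded value is pulled strongly toward the global mean instead of memorizing that single label.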

Data Quality and Imbalance Handling

Six Dimensions of Data Quality: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity.
Imbalance Handling:
- SMOTE: Suitable for numerical features, generates synthetic samples by interpolating between minority class samples.
- Decision Threshold Adjustment: Applied after training with no retraining needed, so it is usually the most cost-effective option.
- Anomaly Detection: When class ratios are extreme (e.g., 99.99:0.01), use Isolation Forest or One-Class SVM.
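
Decision threshold adjustment can be sketched as a sweep over candidate cut-offs, scoring each with an imbalance-aware metric. The labels, scores, and candidate thresholds below are made-up values for illustration, and choosing F1 as the selection criterion is an assumption; in practice the criterion should reflect the actual cost of each error type.

```python
import math

def confusion(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, tn, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(y_true, scores, candidates):
    """Pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_score(*confusion(y_true, scores, t)))

# Hypothetical model scores: 8 negatives and 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.40, 0.35, 0.60]

default_f1 = f1_score(*confusion(y_true, scores, 0.5))
chosen = best_threshold(y_true, scores, candidates=[0.2, 0.35, 0.5])
chosen_f1 = f1_score(*confusion(y_true, scores, chosen))
chosen_mcc = mcc(*confusion(y_true, scores, chosen))
```

On this toy data, lowering the threshold from the default 0.5 recovers the second positive at the cost of one false positive, and no retraining is involved.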

Machine Learning Algorithms

Supervised Learning

Linear Models: Logistic Regression outputs probabilities, suitable for binary classification.
Decision Trees: Make predictions via split rules; high explainability, but single trees are prone to overfitting.
SVM: Finds decision boundaries via Maximum Margin, suitable for high-dimensional, small-sample data.
Ensemble Learning:
- Bagging (Random Forest): Reduces Variance.
- Boosting (XGBoost, LightGBM, CatBoost): Reduces Bias, improves predictive power.
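
To make "Logistic Regression outputs probabilities" concrete: the model passes a linear score through the sigmoid function. The weights and bias below are hand-picked, hypothetical values rather than fitted parameters.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic regression: the sigmoid of a linear score gives P(y=1 | x)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters chosen by hand for illustration (not learned from data).
weights, bias = [0.8, -0.5], 0.1

p = predict_proba([2.0, 1.0], weights, bias)  # linear score z = 1.6 - 0.5 + 0.1 = 1.2
label = 1 if p >= 0.5 else 0                  # 0.5 is the conventional default threshold
```

The probability only becomes a class label once a threshold is applied, which is why the threshold can be tuned separately from training.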

Unsupervised Learning

K-Means: Centroid-based clustering that assumes roughly spherical clusters; the value of K must be specified in advance.
DBSCAN: Density-based clustering, automatically identifies noise points, no need to specify the number of clusters.
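
A minimal sketch of the K-Means loop (Lloyd's algorithm) shows why K must be fixed up front: the number of starting centroids determines the number of clusters. The starting centroids and sample points are arbitrary assumptions; real implementations typically use smarter initialization such as k-means++.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centroids.append(old)  # keep the centroid of an empty cluster
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Two hypothetical, well-separated groups of 2-D points; K = 2 starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Every point is forced into some cluster, so unlike DBSCAN this procedure has no notion of noise points.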

Deep Learning and Model Architecture

CNN: Convolutional layers extract local features, suitable for image processing.
RNN/LSTM: Process sequential data; LSTM uses gating mechanisms to mitigate the vanishing gradient problem.
Transformer: Based on the Self-Attention mechanism, supports parallel computing, and is the foundation of modern LLMs.
Diffusion Model: Generates high-quality images through a reverse denoising process.
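
The Self-Attention mechanism behind the Transformer can be sketched as scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The version below is a simplified illustration in plain Python: it assumes Q = K = V equal to the input embeddings, whereas a real layer first applies learned projection matrices and uses multiple heads.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(q, k, v):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

# Three tokens with hypothetical 2-dimensional embeddings.
# Assumption for brevity: Q = K = V = x (no learned projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(x, x, x)
```

Each row of `weights` is computed independently of the others, which is what allows all tokens to be processed in parallel rather than step by step as in an RNN.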

AI Governance and Security

AI Governance Framework

EU AI Act: A risk-based regulatory framework that prohibits unacceptable-risk AI practices and strictly regulates high-risk AI systems.
NIST AI RMF: Provides a process language for risk management (Govern, Map, Measure, Manage).
ISO/IEC 42001: International standard for AI management systems, emphasizing accountability and continuous improvement.

Security Protection

Prompt Injection: Defense focuses on isolating instructions from data.
Privacy Protection: Uses Differential Privacy to inject noise or Federated Learning to ensure raw data never leaves the local environment.
Explainability (XAI): SHAP and LIME are the mainstream tools for post-hoc explanation of black-box models.
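
The noise injection used by Differential Privacy can be illustrated with the Laplace mechanism applied to a counting query. This is a sketch under stated assumptions: the `ages` data, the `epsilon` value, and the helper names are hypothetical, and a counting query is assumed to have sensitivity 1 (adding or removing one person changes the count by at most 1).

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # assumption: one individual shifts a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data; a fixed seed is used only to make the sketch repeatable.
rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means a larger noise scale and stronger privacy, at the cost of a less accurate released count.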

Change Log: 2026-05-20 Initial document created.
